Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Flesh out Protocol with Peer and Metadata Requests, Specify block deduplication logic #355

Open
wants to merge 1 commit into
base: feat/graphsync-cbor
Choose a base branch
from

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Jan 22, 2021

Goals

This PR has a couple of goals (on top of the protocol rewrite for CBOR):

  • Add layers to the graphsync protocol to facilitate discovery -- specifically, only returning other peers that may have content and only return metadata about a request -- both of these were vaguely implied in the initial protocol description but not made clear.
  • Defining clearly the relationship between metadata and blocks (currently implied), and perhaps controversially defining the deduplication an implementation should support BY default (meaning all requestors must be able to consume responses with this kind of deduplication). The ambiguity in the relationship between what blocks are sent with each response leads to a protocol that is unclear to implement. By specifying the desired behavior, we make it clear what each implementation MUST support.

Implementation

  • Cancel and update boolean fields are compressed into an enumerated requestType field
  • Specify operations and expected response codes for Peer request and Metadata request
  • Define the rules regarding metadata and blocks in a normal graphsync request -- this has been ambiguous, bit is made clear: a block in a response must have a corresponding link in the metadata. Otherwise it's considered an invalid block and will be dropped.
  • Perhaps most importantly, we are REDUCING the scope of deduplication provided by default. The only kinds of deduplication supported by default are:
    • deduplicating within a request - if we traverse the same block twice in a single request, we don't send it twice
    • deduplicating in a message -- if two requests send the same block in the same message, only send it once
  • go-graphsync up till now has supported an additional type of deduplication: if two requests are being processed simultaneously, and the same block is encountered in both requests, but at different times so that it ends up in seperate network messages, it is STILL deduplicated. This has led to a LOT of implementation complexity. Ironically, this feature is disabled in the main production use case, in Filecoin, where we had to turn it off because it became too complicated to support in Filecoin's case of putting returned blocks in different blockstores.
  • I think it makes sense for additional forms of deduplication, which may lead to significant implementation complexity, to be left to extensions. This lowers the barrier for implementation in each new language, but still enables specialized use cases where more aggressive deduplication is needed. Note that we already have an example of this in the DoNotSendCids extension which is used to support resuming requests.

add metadata and peer only requests, specify block deduplication behavior
@warpfork
Copy link
Contributor

I have only skimmed this for syntax rather than gone deep on semiotics (e.g. whether this will actually make the protocol "better" in some way), but what I see at that level looks good to me.

@warpfork
Copy link
Contributor

Is there a way we could get this on a good trajectory for merge? Even if that might be by relabeling it as a wishlist for a future un-version-numbered development cycle?

There's good content in here, but I'm doing a cycle of being angsty about long-lived open PRs and would like to clean house :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants